#3 XCiT: Cross-Covariance Image Transformers (Facebook AI)

Date: June 23, 2021

After dominating Natural Language Processing, Transformers have taken over Computer Vision recently with the advent of Vision Transformers. However, the attention mechanism’s quadratic complexity in the number of tokens means that Transformers do not scale well to high-resolution images. XCiT is a new Transformer architecture, containing XCA, a transposed version of attention, reducing the complexity from quadratic to linear, and at least on image data, it appears to perform on par with other models. What does this mean for the field? Is this even a transformer? What really matters in deep learning? Link for the video

Share on

Twitter Facebook LinkedIn

Sahil Khose

Share on